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Abstract 



■ This paper addresses two questions in the context of neuronal networks dynamics, using methods 

from dynamical systems theory and statistical physics: (i) How to characterize the statistical prop- 
erties of sequences of action potentials ("spike trains") produced by neuronal networks ? and; (ii) 
what are the effects of synaptic plasticity on these statistics ? We introduce a framework in which 
spike trains are associated to a coding of membrane potential trajectories, and actually, constitute a 
symbolic coding in important explicit examples (the so-called gIF models). On this basis, we use the 
CO ' thermodynamic formalism from ergodic theory to show how Gibbs distributions are natural proba- 

^ ' bility measures to describe the statistics of spike trains, given the empirical averages of prescribed 

quantities. As a second result, we show that Gibbs distributions naturally arise when considering 
' "slow" synaptic plasticity rules where the characteristic time for synapse adaptation is quite longer 

, than the characteristic time for neurons dynamics. 

Keywords Neurons dynamics, spike coding, statistical physics, Gibbs distributions. Thermodynamic 
T-H formalism. 
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1 Introduction. 

Spike trains as a "neural code". Neurons activity results from complex and nonlinear mechanisms 
(Hodgkin & Huxley, 1952; Cronin, 1987; Dayan & Abbott, 2001; Gerstner & Kistler, 20G2b), leading to 
a wide variety of dynamical behaviours (Cronin, 1987; Guckenheimer & Labouriau, 1993). This activity 
is revealed by the emission of action potentials or "spikes" . While the shape of an action potential is 
essentially always the same for a given neuron, the succession of spikes emitted by this neuron can have 
a wide variety of patterns (isolated spikes, periodic spiking, bursting, tonic spiking, tonic bursting, etc 
. . . ) (Izhikevich, 2004; Brette & Gerstner, 2005; Touboul, 2008), depending on physiological parameters, 
but also on excitations coming either from other neurons or from external inputs. Thus, it seems natural 
to consider spikes as "information quanta" or "bits" and to seek the information exchanged by neurons 
in the structure of spike trains. Doing this, one switches from the description of neurons in terms of 
membrane potential dynamics, to a description in terms of spike trains. This point of view is used, in 
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experiments, by the analysis of raster plots, i.e. the activity of a neuron is represented by a mere vertical 
bar each time this neuron emits a spike. Though this change of description raises many questions, it is 
commonly admitted in the computational neuroscience community that spike trains constitute a "neural 
code". 

This raises however other questions. How is "information" encoded in a spike train: rate coding 
(Adrian & Zotterman, 1926), temporal coding (Theunissen & Miller, 1995), rank coding (Perrinet, De- 
lorme, Samuelidcs, & Thorpe. 2001; Delorme, Perrinet, & Thorpe, 2001), correlation coding (Johnson, 
1980) ? How to measure the information content of a spike train ? There is a wide literature dealing with 
these questions (Nirenberg & Latham, 2003; D. Johnson, 2004; Barbieri et al., 2004; Nemenman, Lewen, 
Bialck, & Steveninck, 2006; Arabzadeh, Panzeri, & Diamond. 2006; Sinanovic & Johnson, 2006; Gao, 
Kontoyiannis, & Bienenstock, 2008; Osbone, Palmer, Lisberger, & Bialek, 2008), which are inherently 
related to the notion of statistical characterizations of spike trains, see (Rieke, Warland, Steveninck, & 
Bialek, 1996; Dayan & Abbott. 2001; Gerstner & Kistlcr, 2002b) and references therein for a review. As 
a matter of fact, a prior to handle "information" in a spike train is the definition of a suitable probability 
distribution that matches the empirical averages obtained from measures. Thus, in some sense that we 
make precise in this paper, the choice of a set of quantities to measure (observables) constrains the form 
of the probability characterizing the statistics of spike trains. 

As a consequence, there is currently a wide debate on the canonical form of these probabilities. While 
Poisson statistics, based on the mere knowledge of frequency rates, are commonly used with some success 
in biological experiments (Georgopoulos, Kalaska, Caminiti, & Massey, 1982; Georgeopoulos, Merchant, 
Naselaris, & Amirikian, 2007), other investigation evidenced the role of spikes coincidence or correlations 
(Grammont & Riehle, 1999, 2003) and some people have proposed non Poisson probabilities, such as 
Ising-like Gibbs distributions, to interpret their data (Schneidman, Berry, Segev, & Bialek, 2006; Tkaeik, 
Schneidman, Berry, & Bialek, 2006). It is important to note here that beyond the determination of the 
"right" statistical model there is the aim of identifying which type of information is used by the brain to 
interpret the spike trains that it receives, coming from different sources. As an example choosing a model 
where only frequency rates matters amounts to assuming that spikes coming from different neurons are 
essentially treated independently. 

However, determining the form of the probability characterizing the statistics of spike trains is ex- 
tremely difficult in real experiments. In the present paper, we focus on neural networks models considered 
as dynamical systems, with a good mathematical and numerical control on dynamics. In this context 
we argue that Gibbs distributions are indeed natural candidates whenever a set of quantities to measure 
(observables) has been prescribed. As a matter of fact Poisson distributions and Ising-like distributions 
are specific examples, but, certainly, do not constitute the general case. 

Synaptic plasticity. The notion of neural code and information cannot be separated from the capacity 

of neuronal networks to evolve and adapt by plasticity mechanisms, and especially synaptic plasticity. 
The latter occurs at many levels of organization and time scales in the nervous system (Bienenstock, 
Cooper, & Munroe, 1982). It is of course involved in memory and learning mechanisms, but it also alters 
excitability of brain area and regulates behavioural states (e.g. transition between sleep and wakeful 
activity). Therefore, understanding the effects of synaptic plasticity on neurons dynamics is a crucial 
challenge. On experimental grounds, different synaptic plasticity mechanisms have been exhibited from 
the Hebbian's ones (Hebb, 1949) to Long Term Potentiation (LTP) and Long Term Depression (LTD), 
and more recently to Spike Time Dependent Plasticity (STDP) (Markram, Liibke, Frotscher, & Sakmann, 
1997; Bi & Poo, 2001) (see (Dayan & Abbott, 2001; Gerstner & Kistler, 2002a; Cooper, Intrator, Blais, 



2 



& Shouval, 2004) for a review). Modeling these mechanisms requires both a bottom- up and top-down 
approach. 

This issue is tackled, on theoretical grounds, by inferring "synaptic updates rules" or "learning rules" 

from biological observations (Malsburg, 1973; Bicncnstock ct al., 1982; Miller, Keller, & Stryker, 1989) 
and extrapolating, by theoretical or numerical investigations, the effects of such synaptic rule on such 
neural network model. This bottom-up approach relies on the belief that there are "canonical neural 
models" and "canonical plasticity rules" capturing the most essential features of biology. Unfortunately, 
this results in a plethora of canonical "candidates" and a huge number of papers and controversies. In an 
attempt to clarify and unify the overall vision, some researchers have proposed to associate learning rules 
and their dynamical effects to general principles, and especially to "variational" or "optimality" principles, 
where some functional has to be maximised or minimised (Dayan & Hausser, 2004; Rao & Sejnowski, 
1991, 2001; Bohte & Mozer, 2007; Chechik, 2003; Toyoizumi, Pfister, Aihara, & Gerstner, 2005, 2007). 
Therefore, in these "top-down" approaches, plasticity rules "emerge" from first principles. Unfortimately, 
in most examples, the validations of these theories has been restricted to considering isolated neurons 
submitted to input spike trains with ad hoc statistics (typically, Poisson distributed with independent 
spikes (Toyoizumi et al., 2005, 2007)). 

However, addressing the effect of synaptic plasticity in neuronal networks where dynamics is emerging 
from collective effects and where spikes statistics are constrained by this dynamics seems to be of central 
importance. This is the point of view raised in the present paper, where, again, we focus on models, 
which are simplifications of real neurons. Even in this case, this issue is subject to two main difficulties. 
On one hand, one must identify the generic collective dynamical regimes displayed by the model for 
different choices of parameters (including synaptic weights). On the other hand, one must analyse the 
effects of varying synaptic weights when applying plasticity rules. This requires to handle a complex 
interwoven evolution where neurons dynamics depends on synapses and synapses evolution depends on 
neuron dynamics. The first aspect has been addressed by several authors using mean- field approaches 
(see e.g. (Samuelides & Cessac, 2007) and references therein), "Markovian approaches" (Soula & Chow, 

2007) , or dynamical system theory (see (Cessac & Samuelides, 2007) and references therein). The second 
aspect has, up to our knowledge, been investigated theoretically in only a few examples with Hebbian 
learning (Dauce, Quoy, Cessac, Doyon, & Samuehdes, 1998; Siri, Berry, Cessac, Delord, & Quoy, 2007, 

2008) or discrete time Integrate and Fire models with an STDP like rule (Soula, 2005; Soula, Beslon, & 
Mazet, 2006) and is further addressed here. 

What the paper is about. To summarize the previous discussion the study of neuronal networks at 
the current stage is submitted to two central questions: 

• How to characterize the statistics of spike trains in a network of neurons ? 

• How to characterize the effects of synaptic plasticity on this network dynamics and especially on 
spike trains statistics ? 

In this paper we suggest that these two questions are closely entangled and must both be addressed 
in the same setting. In order to have a good control on the mathematics we mainly consider the case 
of neural networks models where one has a full characterization of the generic dynamics (Cessac, 2008; 
Cessac & Vieville, 2008). Thus, our aim is not to provide general statements about biological neural 
networks. We simply want to have a good mathematical control of what is going on in specific models, 
with the hope that this analysis should shed some light on what happens (or does not happen) in "real 
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world" neural systems. However, though part of the results used here arc rigorous, this work relics also 
on "working assumptions" that we have not been able to check rigorously. These assumptions, which are 
essentially used to apply the standard theorems in ergodic theory and thermodynamic formalism, provide 
a logical chain which drives us to important conclusions that could be checked in experimental data. 

The paper is organised as follows. In section 2 we introduce a framework in which spike trains are 
associated to a coding of membrane potential trajectories, and actually, constitute a symbolic coding 
in explicit examples. On this basis, we show how Gibbs distributions arc natural probability measures 
to describe the statistics of spike trains, given the data of known empirical averages. Several authors 
have discussed the relevance of Gibbs distribution in the context of Hopfield model (see (Amit, 1989) 
and references therein) and more recently to spike trains (Wood, Roth, & Black, 2006; Kang & Amari, 
2008). A breakthrough has been made in (Schneidman et al., 2006; Tkacik et al., 2006). Actually, our 
approach has been greatly influenced by these two papers, though our results hold in a wider context. In 
section 3, wc discuss the effect of synaptic adaptation when considering "slow" synaptic plasticity rules 
where the characteristic time for synapse adaptation is quite a bit longer that the characteristic time for 
neurons dynamics. These rules are formulated in the context of thermodynamic formalism, where we 
introduce a functional, closely related to thermodynamic potentials like free energy in statistical physics, 
and called "topological pressure" in ergodic theory. In this setting we show that the variations of synaptic 
weights leads to variation of the topological pressure that can be smooth ("regular periods"), or singular 
("phase transitions"). Phase transitions are in particular associated to a change of "grammar" inducing 
modifications in the set of spike trains that the dynamics is able to produce. We exhibit a functional, 
closely related to the topological pressure, that decreases during regular periods. As a consequence, 
when the synaptic weights converge to a fixed value, this functional reaches a minimum. This minimum 
corresponds to a situation where spike trains statistics are characterized by a Gibbs distribution, whose 
potential can be explicitly written. An example with numerical computations is presented in section 4. 

2 Neuron dynamics. 
2.1 Neural state. 

We consider a set of N neurons. Each neuron i is characterized by its state, Xi, which belongs to some 
compact set T e R*^. M is the number of variables characterizing the state of one neuron (we assume 
that all neurons are described by the same number of variables). A typical example is an integrate and fire 
model where M = 1 and Xi — Vi is the membrane potential of neuron i and T = [Vmin, Vmax] (see section 
2.5). Other examples are provided by conductances based models of Hodgkin- Huxley type^ (Hodgkin & 
Huxley, 1952). Then Xi = {Vi,mi,ni, hi) where mj, rij are respectively the activation variable for Sodium 
and Potassium channels and hi is the inactivation variable for the Sodium channel. 

The evolution of these N neurons is given by a deterministic dynamical system of form: 

X(i + 1) = [X(t)] (1) 

where X = {Xi}-^^ represents the dynamical state of a network of N neurons at time t, while time is 
discrete (for a discussion on time discretisation in spiking neural networks see (Cessac & Vieville, 2008)). 
Thus X G M = X^ where M is the phase space of (1), and F-y{M) C M. The map : A4 ^ A4 

^Note that Hodgkin-Huxley equations axe differential equations, while (1) corresponds to a discrete time evolution. We 
assume that we have discretized time with a time scale that can be arbitrary small. A detailed discussion on these aspects 
for conductance based IF models has been presented in (Cessac & Vieville, 2008). Some further comments are given below. 
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depends on a set of parameters 7 G R^. The typical case considered here is 7 = 

^yy j(ext)^ where W 

is the matrix of synaptic weights and I^'^^*) is some external current, assumed to be independent of time 
in the present paper (see section 2.5 for two explicit examples). Thus 7 is a point in a P = N"^ + N 
dimensional space of control parameters. 

2.2 Natural partition. 

Neurons are excitable systems. Namely, neuron i "fires" (emits a spike or action potential), whenever its 
state Xi belongs to some connected region Vi of its phase space. Otherwise, it is quiescent {X e Vo = 
T\Vi). In Integrate and Fire models neuron i fires whenever its membrane potential Vi exceeds some 
threshold 9. In this case, the corresponding region is Vi = [9,Vmax]- In Fitzhugh-Nagumo (FitzHugh, 
1955, 1961; Nagumo, Arimoto, & Yoshizawa, 1962) or Hodgkin-Huxley model (Hodgkin & Huxley, 1952), 
the firing corresponds to the crossing of a manifold called the threshold separatrix (Cronin, 1987) which 
separates the phase space into two connected regions Vq and Vi. For N identical neurons this leads to 
a "natural partition" V of the product phase space M. Call A = {0,1}^, uj = {oJi)^^i G A. Then, 
V = {Vi^}^^^, where V^^ = V^^ x V^j^ x • • • x V^^^. Equivalently, if X e T'jj, all neurons such that lJi — I 
are firing while neurons such that o^j, = are quiescent. We call therefore u) a "spiking pattern". 

2.3 Raster plots. 

To each initial condition X G we associate a "raster plot" u ~ {<^(i)}^L^ such that X(t) G 'Pui(t),^i > 
0. We write X ^ w. Thus, uj is the sequence of spiking patterns displayed by the neural network 
when prepared with the initial condition X. On the other way round, we say that an infinite sequence 
Lo = {i^{t)}^^ is an admissible raster plot if there exists X G such that X ^ lj. We call E-^ the set 
of admissible raster plots for the set of parameters 7. 
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Fi gur6 \\ The phase space (here represented for N = 2 neurons with constant threshold) is partitioned in 2^ parts 
Vuj = YliLi ^i^i) ^ being a spiking pattern. In this way, one associates naturally to an orbit of (1) a raster plot. 
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The dynamics (1) induce a dynamics on the set of raster plot in the following way: 

X ^ F^(X) 

L I 

In this construction a~f, called the "left shift", shifts the raster plot left- wise at each time step of the 
dynamics. Thus, in some sense, raster plots provide a code for the orbits of (1). But, the correspondence 
may not be one-to-one. That is why we use the notation ^ instead of — >. 

Let us introduce the following notation. If we are interested in a prescribed sequence of spiking 
patterns u){s), . . .u>{t), from time s to time t, we denote by [w]^ j, the set of raster plots whose spiking 
patterns from time s to time t match this sequence (cylinder set). 

2.4 Asymptotic dynamics. 

We are here mainly interested in the asymptotic behavior of (1) and set a few notations and notions. 
The w-limit set, il, is the set of accumulation points of F^(A1). Since M is closed and invariant, we 
have fl = Ht^o F^y(A^). In dissipative systems (i.e. a volume element in the phase space is dynamically 
contracted), the w-limit set typically contains the attractors of the system. A compact set ^ G is 
called an attractor for F-y, if there exists a neighborhood U oi A and a time n > such that F^ {U) C U 

oo 

and .4 = Pi Ft^{U). In all examples considered in the present paper F-y, is dissipative and the phase 

t=o 

space is divided into finitely many attraction basins each of them containing an attractor (see Fig. 2). 
Simple examples of attractors are stable fixed points, or stable periodic orbits. More complex attractors 
such as chaotic attractors can be encountered as well. The attraction basins and attractors change when 
the parameters 7 vary. These changes can be smooth (structural stability) or sharp (typically this arises 
at bifurcations points). 

2.5 A generic example: generalized integrate and fire models 
2.5.1 Quiescent stage of the neuron. 

Among the various models of neural dynamics, generalized integrate and fire models play a central role, 
due to their (relative) simplicity, while the biological plausibility is well accepted (Rudolph & Destexhe, 
2006; Gerstner & Kistler, 2002a). A representative class of such models is of form: 

^ = -1 (14 - E^) + it'^ - zi«-")(y., t, (2) 

defining the evolution of the membrane potential Vfe of neuron k. Here tl = RC ~ 10 — 20ms is the 
membrane time-constant related to the membrane resistance and its electric capacity, while El — —9>QmV 

is the related reversal potential. The term is an external "current"^ assumed to be time constant in 
this paper. In the end of this section we shall however consider the case where some noise is superimposed 

■^This is a slight abuse of language since we have divided eq. (2) by the membrane capacity. 
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Figure 2: Schematic illustration of the attractor landscape for neural network models. [A] The phase space is partitioned 
into bounded domains Bi and for each initial condition in Bi the initial trajectory is attracted toward an attractor (e.g. a 
fixed point, a periodic orbit or a more complex attrax;tor) Ai ■ [B] If the parameters (external current, weights) change, the 
landscape is modified and several phenomena can occur: change in the basins shape, number of attractors, modification of 
the attractor as for ^3 in this example; A point belonging to Ai in Fig. 2 A, can, after modification of the parameters, 
converge either to attractor A'2 or A'^ 



upon the constant external current, is the n-th firing time^ of neuron j and |ij | is the list of 

firing times of all neurons up to time t. 
The synaptic currents reads: 

^'^\yk,t, {tf}^) = {V, - E+) Y^mit, {tf}^) + {V, - E-) Y^mit, {tf}^), 

where are reversal potential (typically E~^ ~ OmV and E~ ~ —75mV). E and X refers respectively to 

excitatory and inhibitory neurons and the + (— ) sign is relative to excitatory (inhibitory) synapses. Note 
that conductances are always positive thus the sign of the post-synaptic potential (PSP) is determined 
by the reversal potentials E'^. At rest (14 ~ — 70my) the + term leads to a positive PSP while — leads 
to a negative PSP. 

Conductances depend on past spikes via the relation: 

Mj(t,V) 
n=l 

In this equation, Mj{t,Y) is the number of times neuron j has fired at time t and the list of 

firing times of all neurons up to time t. Gkj are positive constants, proportional to the synaptic efficacy: 

r Wkj = E+Gkj if j e £, , . 

\ Wkj = E-Gkj if j G I. ^""^ 

^For continuous time systems, the firing times of neuron k, for the trajectory V, is defined by: 

4")(v) = inf {t \t > 4"-^)(v), Vk{t) > e} , 

where t^^^ = —00. 
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Wc use the convention Wkj = if there is no synapse from j to k. Finally, a. represents the un- 
weighted shape (called a a-shape) of the post-synaptic potentials. Known examples of a-shapes are 
a{t) = Ke-*/''H{t) or a{t) = Kte'^/'' H{t), where H is the Heaviside function. 

Then, we may write (2) in the form (see (Cessac & Vieville, 2008) for more details): 



dVk ^ . 
+ 9kVk= Ik, 



with: 

9k[ 

and: 

E 



('.{'?"}.)-^+i:»,('-{'r'},). 



Mjit.y) Mj{t.y) 



j^S n=l j^X n=l 



2.5.2 Fire regime. 

The previous equations hold whenever the neuron is quiescent, i.e. whenever membrane potential is 
smaller than a threshold ^ > 0, usually depending on time (to account for characteristics such as refractory 
period of the neuron) and on the neuronal state. Here, however, we assume that is a constant (in fact 
without loss of generality (Gerstner & Kistler, 2002a)). When the membrane potential exceeds the 
threshold value, the neuron "fires" (emission of an action potential or "spike"). The spike shape depends 
on the model. In the present case, the membrane potential is reset instantaneously to a fixed value 
Vreset — E^, Corresponding to the value of the membrane potential when the neuron is at rest. For 
simplicity we set Vreset = without loss of generality. This modeling of the spike introduces a natural 
notion of spike time, but the price to pay is to introduce a discontinuity in dynamics. Moreover, since 
spike is instantaneous the set of possible spike times occurring within time interval is uncountable in the 
continuous case. Introducing a minimal time discretisation at scale 6 removes this pathology. 



2.5.3 Time discretisation. 

Assuming that spike times are only known within a precision 6 > (Cessac & Vieville, 2008) one shows 
that the dynamics of membrane potential, discretized at the time scale 6 writes: 

Vk{t + 6)= pk{t, t + 6, Mo^J Vk{t) + Jk{t, Ho^J (4) 

where: 

Jk{t,Ha.t) = ^k{s,[^]o,t)Pk{s,t + S,[iJ]Q^^)ds, 

9k{t,MoJ = — +T,j=iGk3T,n=i a^{s-t)'), 

where Mj{t,uj) is the number of spikes emitted by neuron j up to time t, in the raster plot oj. In the 
sequel we assume that a^{u) = whenever |u| > tm, i-e. gk{t, [w]o t) depends on past firing times over 
a finite time horizon tm- We also set (5 = 1. 



8 



2.5.4 Model I and model II. 



A step further, two additional simplifications can be introduced: 

1. Assuming that pk{t,t + d, [u]q j) = p is almost constant, which is equivalent to considering "current" 

synapses instead of "conductance" synapses, i.e. neglect the dependency of i^*^"^ with respect to 

Vk-, 

2. Considering a much simpler form for Jk{t, [<^]o t)' where each connected neuron simply increments 
the membrane potential during its firing state. This is equivalent to considering the post-synaptic 
profiles in the previous equations as instantaneous step-wise profiles. 

Considering these assumptions leads the following dynamics of membrane potential: 

F^,,(V) = pVi{i- z[Vi\) + J2 WijZ[Vj] + /r*; i = l...N, (5) 

where F^_j is the i-th component of F^, Z{x) = 1 if x > 6 and otherwise, both the integrate and 
firing regime being integrated in this equation. It turns out that this corresponds exactly to the time 
discretisation of the standard integrate and fire neuron model, which as discussed in e.g.,(Izhikevich, 
2003) provides a rough but realistic approximation of biological neurons behaviors. This model is called 
model I in the sequel whereas the model based on (4) is called model II. 



2.5.5 Genericity results for models I and II. 

The map F^ in models I and II is locally contracting (Cessac, 2008; Cessac & Vieville, 2008), but it is 
not smooth due to the sharp threshold in neurons firing definition. The singularity set where F^ is not 
continuous is: 

S^{\ eM\3i = l...N, such that Vi=e}. 

This is the set of membrane potential vectors such that at least one of the neurons has a membrane 
potential exactly equal to the threshold. Because of this, dynamics exhibit sensitivity to perturbations 
for orbits which approach too close the singularity set. 
Now, call 

d(f2,<S) = inf inf min YVAt) - 61 (6) 
Then, the following theorem holds (Cessac, 2008; Cessac & Vieville, 2008). 

Theorem 1 For a generic (in a metric and topological sense) set of parameters 7, d{^l,S) > 0. Conse- 
quently, 

1. Q is composed of finitely many periodic orbits with a finite period, Thus, attractors are generically 
stable period orbits. 

2. There is a one-to-one correspondence between membrane potential trajectories and raster plots, 

except for a negligible set of A4 . 

3. There is a finite Markov partition. 
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Remarks 



1. Note however that, depending on parameters (synaptic weights, external current), some periods can 
be quite large (well beyond any accessible computational time) Also, the number of stable periodic 
orbits can grow exponentially with the number of neurons. 

2. Result 2 means that spike trains provide a symbolic coding for the dynamics. 

3. See section 2.6.1 for the implications of result 3. 

These models constitute therefore nice examples where one has a good control on dynamics and where 
one can apply the machinery of thermodynamic formalism (see Appendix). Note that since dynamics is 
eventiially periodic, one may figure out that there is nothing non trivial to say from the respect of spike 
train statistics. The key point is that period are practically so large that there is any hope to see the 
periodicity in real experiments. Thus, one may do "as if" the system were "chaotic". This is a fortiori 
the case if one superimposes upon the external current a small amount of noise to the external current 

j(ea;t) 

Also, though dynamics of model I and II may look rather trivial compared to what is expected 

from biological neural networks, it might be that such models are able to approximate trajectories of a 
real neural network, for suitable values of 7 (e.g. after a suitable adaptation of the synaptic weights) 
and provided N, the number of neurons, is sufficiently large. This type of properties are currently 
sought by computational neuroscientist with interesting indications that 'IF models are good enough" 
to approximate biological neurons spike trains (Jolivet, Ranch, Lescher, & Gerstner, 2006). See (Rostro- 
Gonzalez, Cessac, Vasquez, & Vieville, 2009) for a recent illustration of this. 

2.6 Spikes dynamics and statistics. 
2.6.1 Grammar. 

Though dynamics (1) produces raster plots, it is important to remark that it is not able to produce any 
possible sequence of spiking patterns. This fundamental fact is most often neglected in computational 
neuroscience literature and leads, e.g. to severe overestimation of the system's entropy. Also, it plays a 
central role in determining spike train statistics. 

There are therefore allowed and forbidden sequences depending on conditions involving the detailed 
form of F-y and on the parameters 7 (synaptic weights and external current). Let u},u}' be two spiking 
patterns. The transition a; ^ is legal or admissible if there exists a neural state X such that neurons fire 
according to the firing pattern u and, at the next time, according to the firing pattern w'. Equivalently, 
F-y('Pw) nPu)' 0- An admissible transition must satisfy compatibility conditions depending on synaptic 
weights and currents. 

As an example, for model I, compatibility conditions write, for alH = 1 . . . A'^: 

(a) i is such that = 1 andw^ = 1 <^ X^^Li ^ij^j + ^i^^ ^ ^ 

(b) i is such that = 1 and = <^ XljLi WijiVj + If'"* < 

(c) i is such that = andw^ = 1 <^ 3X e V^, such that, pXi + J^f^^ WijCJj + 7?^* > 6 
{d) i is such that = andw^ = 3X e such that, pXi + Y^f^^ WijUJj + If''* < 6 

Note that, while the two first conditions only depend on the partition element Vu,, the two last ones 
depend on the point X e V^- Basically, this means that the natural partition is not a Markov partition. 
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The idea is then to find a refinement of the natural partition such that allowed transitions depend only 
on the partition-elements. 

A nice situation occurs when the refined partition is finite. Equivalently, compatibility conditions 
can be obtained by a finite set of rules or a finite grammar. This is possible if there exists some finite 
integer r with the following property: Construct blocks of spiking patterns a;(0) . . . w(r — 1) of width r. 
There are at most 2^*^ such possible blocks (and in general quite a bit less admissible blocks). Label 
each of these blocks with a symbol a € A, where A is called an alphabet. Each block corresponds to a 
connected domain n^~QF~*[7-'„(s)] C Ai and to a cylinder set [w]q^_;^ C T,^. Define a transition matrix 
G~f : A X A — > {0, 1}, depending on 7, with entries gaf), such that 5^/3 = 1 if the transition /3 — > a 
is admissible, and otherwise. To alleviate the notations in the next equation write a{t) = [uj\^j^^_-^, 
Mt > 0. If 

= {w I 9a{t+l)a{t) = l,Vi > 0} , 

then all admissible raster plots arc obtained via the finite transition matrix G~f which provides the 
grammar of spiking patterns sequences. 

In models I, II such a finite grammar exists for the (generic) set of parameters 7 such that d{Q., S) > 0. 
Moreover, there arc open domains in the space of parameters where the grammar is fixed and is not 
affected by small variations of synaptic weights or current. 

For more general models, it might be that there is no finite grammar to describe all admissible raster 
plots. When dealing with experiments, one has anyway only access to finite raster plots and the gram- 
mar extracted from empirical observations has a finite number of rules (see (Collet, Calves, & Lopez, 
1995; Chazottes, Floriani, & Lima, 1998; Chazottes, 1999) for nice applications of this idea in the field of 
turbulence). This empirical grammar is compatible with the dynamics at least for the time of observation. 

The grammar depends on 7. Assume now that we are continuously varying 7. This corresponds 

to following some path in the space of control parameters. Then, this path generically crosses domains 
where grammar is fixed, while at the boundary of these domains, there is a grammar modification, and 
thus a change in the set of admissible raster plots that dynamics is able to produce. Synaptic adaptation, 
as described in section 3, corresponds precisely to this situation. Thus, we expect that the neural network 
exhibits "regular" periods of adaptation where spike trains before and after synaptic changes, correspond 
to the same grammar. Between these regular periods, sharp changes in raster plots structure are expected, 
corresponding somehow to displaying new transitions and new rules in the spike code. Actually, the ability 
of neural networks to display, after adaptation, a specific subset of admissible sequences, provided by a 
specific grammar, is an important issue of this paper. 

2.6.2 Spike responses of neurons. 

Neurons respond to excitations or stimuli by finite sequences of spikes. In model I and II a stimulus is 
typically the external current but more general forms (spike trains coming from external neurons) could 
be considered as well. Thus, the dynamical response i? of a neuronal network to a stimuli S (which can be 
applied to several neurons in the network), is a sequence u}{t) . . .u}{t + n) of spiking patterns. "Reading 
the neural code" means that one seeks a correspondence between responses and stimuli. However, the 
spike response does not only depend on the stimulus, but also on the network dynamics and therefore 
fiuctuates randomly (note that this apparent randomness is provided by the dynamical evolution and does 
not require the invocation of an exogenous noise). Thus, the spike response is sought as a conditional 
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probability P(R\S) (Rickc ct al., 1996). Reading the code consists in inferring P{S\R) e.g. via Bayesian 
approaches, providing a loose dictionary where the observation of a fixed spikes sequences R does not 
provide a unique possible stimulus, but a set of stimuli, with different probabilities. Having models for 
conditional probabilities P{R\S) is therefore of central importance. For this, we need a good notion of 
statistics. 

2.6.3 Performing statistics. 

These statistics can be obtained in two different ways. Either one repeats a large number of experi- 
ments, submitting the system to the same stimulus S, and computes P{R\S) by an average over these 
experiments. This approach relies on the assumption that the system has the same statistical properties 
during the whole set of experiments (i.e. the system has not evolved, adapted or undergone bifurcations 
meanwhile) . In our setting this amounts to assuming that the parameters 7 have not changed. 

Or, one performs a time average. For example, to compute P{R\S), one counts the number of times 
n{R, T, Lo) when the finite sequence of spiking patterns R, appears in a spike train lu of length T, when 
the network is submitted to a stimulus S. Then, the probability P{R\S) is estimated by: 

P{R\S)=\im . 

I — *oo 1 

This approach implicitly assumes that the system is in a stationary state. 

The empirical approach is often "in-between" . One fixes a time window of length T to compute the 
time average and then performs an average over a finite number J\f of experiments corresponding to 
selecting different initial conditions. In any case the implicit assumptions are essentially impossible to 
control in real (biological) experiments, and difficult to prove in models. So, they are basically used as 
"working" assumptions. 

To summarize, one observes, from M repetitions of the same experiment, Af raster plots Wm,m = 
1 . . .7\A on a finite time horizon of length T. From this, one computes experimental averages allowing to 
estimate P{R\S) or, more generally, to estimate the average value, (</)), of some prescribed observable 
These averages are estimated by : 

m=l t=l 

Typical examples of such observables are 0(a;) = Wi(0) in which case {(f)) is the firing rate of neuron i\ 
(f){uj) = uji{Q)u:j{Q) then {(()) measures the probability of spike coincidence for neuron j and i; ^{u) = 
u}i(T)u}j{0) then {(j)) measures the probability of the event "neuron j fires and neuron i fires t time step 
later" (or sooner according to the sign of r). In the same way P{R\S) is the average of the indicatrix 
function Xfl('^) = 1 if w S .R and otherwise. Note that in (7) we have used the shift for the time 
evolution of the raster plot. This notation is more compact and more adapted to the next developments 
than the classical formula, reading, e.g., for firing rates 

This estimation depends on T and J^. However, one expects that, as J^,T 00, the empirical 
average converges to the theoretical average (ci), as stated e.g. from the law of large numbers. 

Unfortunately, one usually does not have access to these limits, and one is lead to extrapolate theoretical 
averages from empirical estimations. The main difficulty is that these observed raster plots are produced 
by an underlying dynamics which is usually not explicitly known (as it is the case in experiments) or 
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impossible to fully characterize (as it is the case in most large dimensional neural networks models) . Even 
for models I and II, where one has a full characterization of generic orbits, a numerical generation of raster 
plots can produce periodic orbits whose period is out of reach. Thus, one is constrained to propose ad hoc 
statistical models. This amounts to assuming that the underlying dynamics satisfies specific constraints. 

2.6.4 Inferring statistics from an hidden dynamics. 

We follow this track, proposing a generic method to construct statistical models from a prescribed set 
of observables. For this, we shall assume that the observed raster plots are generated by a uniformly 
hyperbolic dynamical system, where spike trains constitute a symbolic coding for dynamics (see the 
appendix for more details). This a technical assumption allowing us to develop the thermodynamic 
formalism on a safe mathematical ground. Basically, we assume that the observed raster plots can be 
generated by a dynamical system which is chaotic with exponential correlation decay, and that a finite 
Markov partition exists, obtained by a refinement of the natural partition. 

This is obviously a questionable assumption, but let us give a few arguments in its favor. First, it 
has been argued and numerically checked in (Rostro- Gonzalez et al., 2009) that finite sequence of spiking 
patterns, produced by an hidden neural network, including data coming from biological experiments, can 
be exactly reproduced by a gIF model of type II, by adding hidden neurons and noise to the dynamics. 
This suggests that gIF models are somehow "dense" in the space of spiking neural networks dynamical 
systems, meaning that any finite piece of trajectory from a spiking neural network can be approached by 
a gIF model trajectory. Second, adding noise to model I, II makes them uniformly hyperbolic^, though 
not continuous. So, we expect the theory developed below to apply to model I, II. Now, dealing with 
spike train coming from a neural network whose mathematical properties are unknown (this is especially 
the case with biological neural networks), and whose statistical properties are sought, the idea is to do 
as if these data were generated by a dynamical system like model I or II. 

As a matter of fact, the choice of a statistical model always relies on assumptions. Here we make an 
attempt to formulate these assumptions in a compact way with the widest range of application. These 
assumptions are compatible with the statistical models commonly used in the literature like Poisson 
models or Ising like models a la Schneidman and collaborators (Schneidman et al., 2006), but lead also 
us to propose more general forms of statistics. Moreover, our approach incorporates additional elements 
such as the consideration of neurons dynamics, and the grammar. This last issue is, according to us, 
fundamental, and, to the best of our knowledge, has never been considered before in this field. Finally, 
this postulate allows us to propose algorithms that can be applied to real data with an posteriori check 
of the initial hypotheses (Cessac, Vasquez, & Vieville, 2009). 

On this basis we propose the following definition. Fix a set (f)i, I ~ 1...K, of observables, i.e. 
functions Y,^ IR which associate real numbers to sequences of spiking patterns. Assume that the 

empirical average (7) of these functions has been computed, for a finite T and Af, and that (/>j ' = C(. 
A statistical model is a probability distribution v on the set of raster plots such that: 

^Contraction is uniform in model I. In model II it can be bounded by a constant strictly lower that 1. Introducing noise 
amounts to adding, for each neuron, an unstable direction where randomness is generated by a chaotic one-dimcnsional 
system. One forms in this way a skew product where the unstable fiber is the random generator and the stable one 
corresponds to the contracting dynamics of membrane potentials (eq. (4), or (5)) Because of the discontinuity of the map, 
the main difficulty is to show that, in this extended system, generic points have a local stable manifold of sufficiently large 
diameter (Cessac & Fernandez, in preparation). See (Blanchard, Cessac, & Krueger, 2000; Cessac, Blanchard, Krueger, &c 
Meunier, 2004) for an application of this strategy in the framework of Self- Organized Criticality. 
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1. 1^(5]-),) = 1, i.e. the set of non admissible raster plots has a zero i/-probability. 

2. v is ergodic for the left-shift (see section 6.1 in the appendix for a definition). 

3. For all I = 1 . . .K, = Ci, i.e., v is compatible with the empirical averages. 

Note that item 2 amounts to assuming that statistics are invariant under time translation. On practical 
grounds, this hypothesis can be relaxed using sliding time windows. This issue is discussed in more details 
in (Cessac et al., 2009). Note also that ly depends on the parameters 7. 

Assuming that v is ergodic has the advantage that one does not have to average both over experiments 
and time. It is sufficient to focus on time average for a single raster plot, via the time-empirical average: 

t=i 

defined within more details in the appendix, section 6.2. 
Item 3 can be generalized as follows. 

3'. d{TT!^\iy) ^ 0, as T ^ 00, 

where d{^, u) is the relative entropy or KuUback-Leibler divergence between two measures ^,, v (see eq. 
(54) in the appendix.). 

Dealing with real or numerical data, it is obviously not possible to extrapolate to T — > 00, but the main 

idea here is to minimize the "distance" d{TXij , v) between the empirical measure, coming from the data, 
and the statistical model. Especially, if several models can be proposed then, the "best" one minimizes 
the Kullback-Leibler divergence (see th. 2 in the Appendix). The main advantage is that d{'K'^\v) can 
be numerically estimated using the thermodynamic formalism. Note however that this approach may fail 
at phase transition points where d(/x, v) = does not necessarily imply jx = v (Chazottes, 1999). Phase 
transition can be numerically detected from empirical data (Comets, 1997). 

2.6.5 Gibbs measures as statistical models. 

Variational principle. Statistical physics naturally proposes a canonical way to construct a statistical 
model: "Maximizing the entropy under the constraints v{(l>i) = Ci, I = 1 . . .K^\ (see (Jaynes, 1957) for 
a beautiful and deep presentation of what became, since then, a "folklore" result). In the context of 
thermodynamic formalism this amounts to solving the following variational principle (see eq. (43) in the 
Appendix) . 

P[V]= sup {h[u\ + v[il,]), 

where the set of invariant measures for a and h the Kolomogorov-Sinai entropy or entropy 

rate (see Appendix for more details). The "potential" if) is given by V = ^i4>i where the A;'s are 
adjustable Lagrange multipliers. A measure which realizes the supremum, i.e. 

P[il)\ = h [v^] + [ip] , 

is called, in this context, an "equilibrium state". The function P['ip] is called the "topological pressure". 
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Gibbs states. Uniformly hyperbolic dynamical systems have equilibrium states. Moreover, in this case, 
equilibrium states are Gibbs states (and vice- versa). A Gibbs state, or Gibbs measure, is a probability 
measure such that, one can find some constants ci, C2 with < ci < 1 < C2 such that for all n > 1 and 
for all CO gT,: 

{uj e [t^]o,„_i) 

(Cf eq. (51) in the Appendix), where S^^^ipiu) = X^/Lo^ ''Pi'^y^)- Basically, this means that the probabil- 
ity that a raster plot starts with the bloc [w]g behaves like "^iH--'^'^ V'l^O) ^ Qj^g recognizes the classical 
Gibbs form where space translation in lattice system is replaced by time translation (shift cr^) and where 
the normalization factor Z,, is the partition function. Note that P ['«/'] = limsup^^,^ ^log Zn, so that 
P is the formal analog of a thermodynamic potential (like free energy). 

Topological pressure. The topological pressure is a convex function. Moreover, this is the generating 
function for the cumulants of the Gibbs distribution. Especially, the Lagrange multipliers A; can be tuned 
to a value A* such that [cpi] = Ci, (item 3) using: 



dP 



dXi 



= Ci. (8) 



It expresses that the Gibbs state is given the tangent of the pressure at A;* (Keller, 1998). Note that we 
have hero assumed that the topological pressure is differentiable, namely that we are away from a phase 
transition. Higher order moments are obtained in the same way, especially second order moments related 
to the central limit theorem obeyed by Gibbs distributions (Bowen, 1975, 2008; Keller, 1998). It is also 
possible to obtain averages of more complex functions than moments (Ji, 1989). The topological pressure 
is obtained via the spectrum of the RucUe-Perron-Frobenius operator and can be calculated numerically 
when tp has a finite (and small enough) range (see appendix for more details). 

Validating a Gibbs statistical model. The KuUback-Leibler divergence provides some notion of 
"distance" between two measures. For fj, an invariant measure and a Gibbs measure with a potential 
■0, both defined on the same set of sequences S, one has (see eq. (55) in the appendix): 



d {jjL, v^) = P[ip]- j ipdii - h{iJ,). 



Though TT^^ is not invariant, this suggests to use this relation to compare different statistical models (i.e. 
when several choices of observables c^; are possible) by choosing the one which minimizes the quantity: 

d{nP,u^) = P [V] - ^^JH'iP) - h{nP). (9) 

(see th. 2 in the appendix). The advantage is that this quantity can be numerically estimated (Cessac 
et al., 2009). 

Probability of spiking patterns blocs. In this context, the probability of a spiking pattern block 
R = [ti^lo of length n corresponding to the response i? to a stimuli S "behaves like" (in the sense of 
eq. (51)): 
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1 



r K n-l 



P [R\S] = v[uj& R\S] 



Zn[X*{S)] 



exp Y.\*,{S)Y,<t>i{<^\w) 



(10) 



J=l t=0 



where the conditioning is made exphcit in the dependence of A* in the stimulus S. (Typically, referring 
to model I and II, S is an external current and is incorporated in 7.) This equation holds as well when 
S' = (no stimulus). Obviously, for two different stimuli the probability P{R\S) may drastically change. 
Indeed, different stimuli imply different parameters 7 thus a different dynamics and different raster plots, 
with different statistical weights on a specific bloc R. The grammar itself can also change, in which case 
R may be forbidden for a stimulus and allowed for another one. 

2.7 Examples. 

2.7.1 Canonical examples. 

Firing rates. If 0;(w) = <jJi{Q), then n^^cjji) = ri is the average firing rate of neuron / within the time 
period T. Then, the corresponding statistical model is a Bernoulli distribution where neuron I has a 

probability r; to fire at a given time. The probability that neuron I fires k times within a time delay 
n is a binomial distribution and the inter-spike interval is Poisson distributed (Gerstner & Kistler, 2002a). 

Spikes coincidence. If (l)i{uj) = (f>(ij-){uj) = LUi{0) LLij{0) where, here, the index I is an enumeration for 
all (non-ordered) pairs (i, j), then the corresponding statistical models has the form of an Ising model, as 
discussed by Schneidman and collaborators in (Schneidman et al., 2006; Tkacik et al., 2006). As shown 
by these authors in experiments on the salamander retina, the probability of spike blocs estimated from 
the "Ising" statistical model fits quite better to empirical date than the classical Poisson model. 

Enlarged spikes coincidence. As a generalization one may consider the probability of co-occurrence 
of spikes from neuron i and j within some time interval r. The corresponding functions are 4>i{lu) = 
cOi{0)u}j{T) and the probability of a spike bloc R writes: 



Further generalizations are considered below. 
2.8 Conclusion. 

The main conclusion of this section is that Gibbs measures constitute optimal (in the sense of entropy 

maximization) statistical models whenever a set of observable has been prescribed. It also allows to select 
a model between several choices by minimizing the Kullback-Leibler divergence between the empirical 
measure and the statistical models. 

In all the examples presented above the statistical model is determined by the choice of an a priori 
form for the potential. In the next section, we explicitly compute the potential in neural networks with 
synaptic adaptation. 



1 



n-l 



P[R\S] = 




exp 
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3 Synaptic plasticity. 



3.1 Synapses update as an integration over spikes trains. 

Synaptic plasticity corresponds to the evolution of synaptic efficacy (synaptic weights). More precisely, 

in our notations (cq. (3)), Wij essentially provides the maximal amplitude of the post-synaptic potential 
induced, at the synapse linking j to i, when neuron j fires a spike. Synaptic weights evolve in time 
according to the spikes emitted by the pre- and post- synaptic neuron. In other words, the variation of 
Wij at time t is a function of the spiking sequences of neurons i and j from time t — Ts to time t, where 
Tg is time scale characterizing the width of the spike trains influencing the synaptic change. In most 
examples the synapse update writes: 

6Wij{t + l)=g (Wij{t), [uji],_^^^, , , t > T,, (11) 

with f = [uJiit — Ts) . . . u}i{t)]. Thus, typically, synaptic adaptation results from an integration of 

spikes over the time scale Tg. 

3.2 Spikes responses of synapses. 

The synaptic variation dWij is the integrated response of the synapse from neuron j to neuron i when 
neuron j sends a spike sequence [tjjjj y ^ and neuron i fires according to [tJiJ^ ^ ^. This response is not 
a deterministic function, while g is. Thus (11) is an approximation. As a matter of fact, the explicit 
form of g is usually derived from phenomenological considerations as well as experimental results where 
synaptic changes can be induced by specific simulations conditions, defined through the firing frequency 
of pre- and post-synaptic neurons (Bliss & Gardner-Medwin, 1973; Dudek & Bear, 1993), the membrane 
potential of the post-synaptic neuron (Artola, Brocher, & Singer, 1990), or spike timing (Levy & Stewart, 
1983; Markram et al., 1997; Bi & Poo, 2001) (sec (Malenka & NicoU, 1999) for a review). Thus, these 
results are usually based on a repetition of experiments involving the excitation of pre- and post-synaptic 
neurons by specific spike trains. The phenomenological plasticity rules derived from these experiments 
are therefore of statistical nature. Namely, they do not tell us what will be the exact changes induced on 
synapses when this or this spike train is applied to pre- and post-synaptic neuron. Instead, they provide 
us the average synaptic change. Thus, the function g{Wij,[u)i\^_rp^ ^ ,[u!j]^_rp ^) in (11) is typically a 
statistical average of the synaptic response when the spike train of neuron j (resp. i) is j (resp. 

['^i]t_T t)' ^iid the actual synaptic weight value is Wij. 

In this paper we investigate the effects of those synaptic plasticity rules when the characteristic time 
scale Tg is quite a bit larger than the time scale of evolution of the neiirons. Namely, we consider slow 
adaptation rules. In this situation the synaptic weights update can be written as a function of the 
empirical average (42). We therefore consider adaptation rules of form: 

9 {Wij, [uJi],_T^^, , k],_T3,t) = IMWij, .)] • (12) 
where e is a parameter that will be typically small. (j)ij{Wij,oj) is a function that we now make explicit. 

Following (Gerstner & Kistler, 2002a) canonical form of adaptation rules use an expansion in singlet, 

pairs, triplets etc of spikes. Formally, one may write a generic form for these rules. Fix Tg < oo 
a positive integer. Let £ be the finite set of ordered lists L in {—Tg,—Tg + 1,...,0} with the form 
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L = {ti,t2, ■ ■ ■ , ti) with —Ts < ti < t2 < ■ ■ ■ < tL < 0- For LeC, l<i<N,we call an i-monomial a 
function rrii^Li'jj) = 'jJi{ti) ■ ■ - oJiith)- We call an (i, j) polynomial a function: 

<j)ij{Wij,uj) = hijL^L2{Wij)mi,Lr{oj)mj,L2{(^), (13) 

where hij^-^L^i^ij) smooth functions of Wij. They can be constant, linear functions of Wij ("multi- 
plicative" rules) or nonlinear functions allowing for example to constrain Wjj within bounded values (e.g. 
"hard bound" or "soft bounds" rules). The form (13) is the most general form of synaptic adaptation 
rules considered in this paper, while g has the explicit form: 



K 

(14) 



fc,;=o -Ts < ti < ■ ■ ■ < tfc < 

-Ts < SI < • ■ ■ < S| < 



Though explicit, this formulation is heavy, and we use instead the form (12), (13). 

3.3 Examples of adaptation rule instantiation. 

Hebbian learning uses firing rates. A typical example corresponds to gij{Wij, [uji]f_rp^ ^ , [i^j]f._rp J = 

^T; S!i,s2=t-T3('^i(«i) - ri{si)){i^jis2) - rj{s2) (correlation rule) where ri{t) = ^ Yfs=t-T, <^i(*) is the 
frequency rate of neuron i in the raster plot u), computed in the time windows [t — Ts,t]. 

Spike- Time Dependent Plasticity as derived from Bi and Poo (Bi & Poo, 2001) provides the average 
amount of synaptic variation given the delay between the pre- and post-synaptic spike. Thus, "classical" 
STDP writes (Gerstner & Kistler, 2002a; Izhikevich & Desai, 2003): 

t 

g (Wij, [uJi]t-Ts,t ' [<^j]t-Ts,t) = Y X] -^(^1 ~ ^2) Wi(si) Wj(S2) (15) 

* Si ,S2=t— Tg 

with: 



/(^) = <^ 



A_e"- , a; < 0, ^_ < 0; 

A+e"^, a;>0, A+ > 0; (16) 
0, a; = 0; 



where the shape of / has been obtained from statistical extrapolations of experimental data. Hence 
STDP is based on a second order statistics (spikes correlations). There is, in this case, an evident time 
scale Tg = max (r_,T+), beyond which / is essentially zero. 

"Nearest neighbors" STDP (according to the terminology of(Izhikevich & Desai, 2003)) writes: 

t 
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with Tj{s) = mmt^^.^t)=i \t - si- 
Generalized STDP As a last example, (Gerstner & Kistler, 2002a) propose a rule which corresponds 
to: 



s=t-Ts s=t-Ts si,S2=t-T, 

(18)- 

We capture thus the main standard synaptic adaptation rules with (13). 



3.4 Coupled dynamics. 

We consider now the following coupled dynamics. Neurons are evolving according to (1). We focus 

here on slow synapses dynamics. Namely, synaptic weights arc constant for T >Ta consecutive dynamics 
steps, where T is large. This defines an "adaptation epoch" . At the end of the adaptation epoch, synaptic 
weights are updated according to (11). This has the consequence of modifying neurons dynamics and 
possibly spike trains. The weights arc then updated and a new adaptation epoch begins. Wc denote by 
t the update index of neuron states (neuron dynamics) inside an adaptation epoch, while r indicates the 
update index of synaptic weights (synaptic plasticity). Call X('^^(t) the state of the neurons at time t 
within the adaptation epoch t. Let VK^j be the synaptic weights from neuron j to neuron at i in the r-th 
adaptation epoch. At the end of each adaptation epoch, the neuron dynamics time indexes are reset, i.e. 
a;^^'''^''(0) = xY\T),i = 1 . . .N. The coupled dynamics writes: 

r XM(t + l) = F^(.,(XM(i)) 

Recall that 7 = (>V,l('=^*)) (see section 2.1) and 7^^) is the set of parameters at adaptation epoch 
r. In the present setting the external current is kept fixed and only synaptic weights are evolving. 

Basically, is used as an external stimulus. 



3.5 Statistical effects of synaptic plasticity. 

Synaptic plasticity has several prominent effects. First, modifying the synaptic weights has an action on 
the dynamics resulting, either in smooth changes, or in sharp changes. In the first case, corresponding 
to parameters variations in a domain where the dynamical system is structurally stable, small variations 
of the synaptic weights induce smooth variations on the w-limit set structure (e.g. points are slightly 
moved) and in the statistics of orbits. The grammar is not changed. On the opposite, when crossing 
bifurcations manifolds, the slightest change in one synaptic weight value results typically in a drastic 
reorganization of the w-limit set structure where attractors and their attraction basin are modified (see 
Fig. 2 and (Dauce et al., 1998; Siri et al., 2007) for an illustration of this in the case of frequency rates 
neural networks with Hcbbian learning) . But this can also change the grammar and the set of admissible 
raster plots. Some forbidden transitions become allowed, some allowed transitions become forbidden. 
Finally, the spikes train statistics are also modified. Typically, the time-empirical average of the raster 
plot changes (tt^^^j "^iT^+i) ) ^^'^ corresponding statistical model also evolves. 
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But synaptic adaptation is not a mere variation of synaptic weights. Indeed, synaptic weights are 
associated with the coupled dynamics (19), meaning that synaptic changes depends on neurons dynamics, 
itself depending on parameters. Thus, a synaptic adaptation process corresponds to following a path in 
the space of parameters 7; this path is not determined a priori but evolves according to neurons dynamics. 
Though this path can belong to a unique domain of structural stability, this is not the case in general. At 
some point during the adaptation, this path crosses a bifurcation manifold inducing sharp changes in the 
dynamics and in the grammar'"^. Hence, after several changes of this type one can end up with a system 
displaying raster plots with a structure rather different from the initial situation. These changes depend 
obviously upon the detailed form of neuron dynamics (1) and upon the synaptic update mechanism (11); 
they arc also conditioned by parameters such as stimuli. We now Analise these effects in the light of 
thermodynamic formalism and Gibbs distributions. 



3.5.1 Static synaptic weights. 

Let us first consider the situation where SW = 0, corresponding to synaptic weights that do not evolve. 
Typically, this is the case if synaptic weights matrix converges to an asymptotic value W*. From eq. (12) 
this corresponds to : 

[-^i.] = 0, Vi,ie{i,...7V}. (20) 

This imposes a condition on the average value of i/iy . Therefore, from section 2.6.5 this imposes that the 
statistical model is a Gibbs measure v with a potential of form: 

^* = $ + A*.0, (21) 

where V* = {^tj)^^^^,, = i<Pij)Z=„ A* = (A*,);^.^^ and A*.0 = Efj^iKj^H- The potential $ in 
(21) is such that ^(uj) = if is admissible and ^{uj) = —00 if it is forbidden, so that forbidden raster 
plots have zero probability. This is a way to include the grammar in the potential (see appendix). The 
statistical parameters X*j, are given by eq. (8) in section 2.6.5, and, making explicit (eq. (13)): 



dP [ij] 



dXij 



^ = Vi,'{(t>i3) = hijL,L2{W*j)iy^' [nii^L.mj^L^] = 0. (22) 

•^='^* Li,L2e£ 

Since rUi^Lirnj^L^ are monomials, this equation thus imposes constraints on the probability 

of spikes n-uplets LVi{ti) . . . Wi(iL)wj(si) . . . uij{sL)- Note that condition (22) corresponds to an extremum 
for the topological pressure as a function of A. Note also that the variational formulation of Gibbs (equi- 
librium) state writes, in this case P [xp*] = sup^g^t^nv) h [v] since z/(V') = 0. Hence, the corresponding 
Gibbs measure has maximal entropy. 

Let us emphasize what we have obtained. The statistical model that fits with the condition (20) in 
section 2.6.5 is a Gibbs distribution such that the probability of a spin block R of length n is given by : 



^ There is here an analogy with phase transitions in statistical physics (Beck & Schloegl, 1995). Detecting and character- 
izing these phase transitions from empirical data is possible provided that there axe only a few control parameters (Comets, 
1997). 
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P[R\S] = 



Zn[X*{S)] 



exp 



CO 



where ip* depends on 5 via the statistical parameters A* and via $ (grammar). 

When the situation dW = corresponds to the asymptotic state for a synaptic adaptation process, 
this potential provides us the form of the statistical model after adaptation, and integrates all past changes 
in the synaptic weights. We now discuss this process within details. 

3.6 Variational formulation of synaptic plasticity. 
3.6.1 Synaptic weights update and related potentials. 

Let us now formalize the coupled evolution (19) in the context of thermodynamic formalism. The main 

(T) 

idea is to make the assumption that at each adaptation step, tt can be approximated by a Gibbs 



measure j^^(t) with potential i/''-^-' and topological pressure P 
synaptic adaptation writes : 



(r) 



. In this case, when T is large, 



5W^J^ = eu. 



The synaptic update results in a change of parameters 7, 7 



(^+1) 



(23) 

7^'^) + (57^^'. This induces a 
Ml +5pir)_ We 



variation of the potential tp^'^'^^^ = tp^'^^ + Stp^'^^ and of the pressure P tp^'^'^^^ = p ip 
now distinguish two situations both arising when synaptic weights evolve. 

3.6.2 Smooth variations. 

Let us first assume that these variations are smooth, i.e. one stays inside a domain where dynamics is 
structurally stable and the grammar G^m is not modified. We assume moreover that the topological 
pressure is difFerentiable with respect to the variation ^7^'^' . 
Let us define: 



:p(^)(w) = p fv^^' + (w - w(^').0(w(^)) 



(24) 



so that ^^\yV^'^^) = 0. Note that J^^\w) is convex, due to the convexity of the topological pressure (eq. 
(48) in appendix). 

Then, using eq. (49) in appendix, the adaptation rule (23) can be written in the form: 



^H;(^)=eVw=ww4"^(W). 
Since, in this section, pressure P is assumed to be smooth, one has, using (49), (50): 



e P 



+(5W(^\</)(>V(^)) -P V^^-* ) =^>V('"^ 
where k^^^ is the tensor with entries: 

+00 



I...N, 



(25) 
(26) 

(27) 



t=i 
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and C. 



>l.Jlf>2>J2 



it) 



o a 



,(.-r)Vi2,j2 \ - ^^(^) [^^iiji] [^i2,j2] is the correlation function of 

4>ii,ji- (t)i.2j2 foi' the measure v^(t). Using the explicit form (13) of (jy-ij one can see that k*^"^' is a sum of 
time correlations between uplets of spikes. This is a version of the fluctuation-dissipation theorem where 
the response to a smooth variation of the potential t/'^'"' is given in terms of a series involving the time 
correlations of the perturbation (Ruelle, 1969, 1999). This series converges provided that dynamics is 
uniformly hyperbolic. 



For sufficiently small e, the matrix I + §k^^' is positive and: 



P 



> p 



It follows that the variation S^'^^J^^, = J^^J+^\w^''+^^) - A''\w^^+'^'>) is given by: 



(r) 



-P 



.j>V(t) _ o(5>v(^)^) 



This variation is therefore negative when e is sufficiently small. 



(28) 



We come therefore to the following important conclusion. The adaptation rule (12) is a gradient 
system where the function J^^^ decreases when iterating synaptic adaptation rules. Were the transition 
r ^ T + 1 to be smooth for all r, would J^^'' reach a minimum^ at some W* as r — > oo. Such a minimum 
corresponds to Ww^^^ = 0, thus to 5W = according to eq. (25). Hence, this minimum corresponds 
to a static distribution for the synaptic weights. Therefore, according to section 3.5.1 the potential ^p^'^'^ 
would converge to the potential (21) as r ^ +00. However, we cannot expect these transitions to be 
smooth for all r. 



3.6.3 Singular variations. 

As we saw, during the synaptic adaptation process, the corresponding path in the space of parameters 
usually crosses bifurcations manifold inducing sharp changes in the dynamics. At those points pressure 
may not be smooth corresponding to a phase transition. These changes arc not necessarily easy to detect 
numerically, though algorithms allowing to detect phase transitions from finite samples exist, based on 
rigorous results (Comets, 1997). Indeed, searching "at blind" for all possible kind of phase transitions in 
so large dimensional dynamical systems, without any idea of what can happen, seems to be desperate, 
even for model I and II, especially when thinking of what can already happen in one dimensional systems 
(Beck & Schloegl, 1995). So we shall not discuss within more details this aspect, focusing on grammar 
changes^. 

Indeed, from the point of view of neural network analysis, grammar changes implies changes in the 
type of raster plots that the network is able to produce. An interesting situation occurs when the set 
of admissible raster plots obtained after adaptation belongs to ^^(t) fl J^^(t+i) ■ In this case, adaptation 

^Additional constraints are required ensuring that J"^^ does not tend to —00. Typically, such constraints amount to 
bound the synaptic weights variation (soft or hard-bounds rules) by a suitable choice of functions hiji^^i^^ in eq. (13). 

'^We actually conjecture that the only possible phase transitions in model I and II are grammar changes occurring when 
crossing the set where d{Q,S) = (eq. (6)). 
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plays the role of a selective mechanism where the set of admissible raster plots, viewed as a neural code, 
is gradually reducing, producing after n steps of adaptation a set flJ^^iS^cm) which can be rather small. 
This has been observed by Soula et al. in (Soula et al., 2006) for model I with Spike Time Dependent 
Plasticity, though not analyzed in those terms. If we consider the situation where (1) is a neural network 
submitted to some stimulus, where a raster plot u encodes the spike response to the stimulus, then 
is the set of all possible raster plots encoding this stimulus. Adaptation results in a reduction of the 
possible coding, thus reducing the variability in the possible responses. 

3.6.4 Evolution of the potential. 

Iterating the adaptation process, one expects to have periods of smooth variations, punctuated by sharp 
transitions where potential and grammar change. We qualify them under the generic name of "phase 
transitions" without further specifications. We write r = {l,n) where I indexes the phase transition 
and n the number of epoch since the last phase transition. The adaptation process corresponds now 
to a sequence ■0^^-' = i/?*-''"-' of Gibbs potentials. Thus, -i/j^''^-* is the potential arising just after the l-th 
transition, with the convention that ip^^'°\u)) = — oo if u) is forbidden, such that i/)*-''"^ characterizes the 
grammar after the l-th phase transition. We call a regular period the succession of adaptation epochs 
between two phase transitions. 

By definition, during a regidar period the synaptic update W^'^"'"^^ = W^'^^ + SW^'^^ induces a smooth 
variation of the potential = tp^'^^ + dip^'^\ and the variational principle (43) selects a new Gibbs 

measure 1/^(^+1) with a potential: 

^(r+l) ^^(r) ^^^(r)^ (29) 

It follows that: 

n 
m=0 

where n is the epoch where the Z-th phase transition aroused. The explicit form of 5if}^'^^ 

■0*^^-') depends on the detailed form of the dynamics, and there is little hope to determine it 
except when the adaptation process converges to a solution with SW = (see section 3.5.1). 
to section 3.6.2, the function J^^^ (eq. (24)) decreases during regular periods. 

When a phase transition occurs, the relation (29) does not hold anymore. We now write: 

where ni is the last epoch before the phase transition Z + 1. Stpl^Jg^'^^ contains the regular variations of 

the potential while '^''/'sirig'*'^ contains the singular variations corresponding to a change of grammar. 

We end up with the following picture. During regular periods the grammar does not evolve, the 
potential changes according to (30), (31), and the function J^^^ decreases. Then, there is a sharp change 
in the grammar and the potential. The function T^^^ may also have singularities and may sharply increase. 
If the adaptation rule converges, the potential (30) converges to ip* = $ + A*.0 (eq. (21)). This potential 
$ characterizes the grammar and the statistical weight of allowed raster plots is given by X*.(f). The 
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potential ip* contains all changes in the grammar and statistics arising during the evolution (30,31). 
Thus it contains the history of the system and the evolution of the grammar. 



4 A numerical example. 

4.1 Model. 

4.1.1 Adaptation rule. 

As an example we consider an adaptation rule inspired from (15) with an additional term rdWjj\ 
— 1 < < 0, corresponding to passive LTD. 



SW 



T+T, 



t=T, 



u=-Ts 



where f{x) is given by (16) and with: 



Set 



Ts '= 2 max(T+ , r_ ) . 



(32) 



H = {^fij}f,=i, and: 



Hij{u)) = LOj{0)Si{Lo), 



(33) 



hj{Wij,(j) = rdWij + Hij{ui), 



with cj) — {(t>ij}fj^i, where is a finite range potential with range 2Ts. Then (32) has the form (12), 
^<^=^'^S) I'PijiWij,.)]. 

4.1.2 Effects of the adaptation rule on the synaptic weights distribution. 

The term 5i(a;) (eq. 33) can be either negative, inducing Long Term Depression, or positive inducing 
Long Term Potentiation. In particular, its average with respect to the empirical measure tt^^^, reads: 



where: 



V = 



— Ll-e^- , _j_l-e^+ 
A_e ;- + A+e ^+ - 



1-e 



1-e 



(34) 
(35) 



and where ^^(t) = '''^(x) {^i) is the frequency rate of neuron i in the r-th adaptation epoch. 

The term 77 neither depend on u) nor on r, but only on the adaptation rule parameters A_ , , r_ , t+ , Tg . 
Equation (34) makes explicit 3 regimes. 
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• Cooperative regime. If 77 > then TT^l^{Si) > 0. Then synaptic weights have a tendency to 
become more positive. This corresponds to a cooperative system (Hirsch, 1989). When iterating 
adaptation, dynamics become trivial with neurons firing at each time step or remaining quiescent 
forever. 

• Competitive regime. On the opposite if 77 < synaptic weights become negative. This corre- 
sponds to a competitive system (Hirsch, 1989). 

• Intermediate regime. The intermediate regime corresponds to 77 0. Here no clear cut tendency 
can be distinguished from the average value of Si and spikes correlations have to be considered as 
well. 

4.1.3 Static weights. 

Thanks to the soft bound term rdWij the synaptic adaptation rule admits a static solution given by: 

TT^TA [a;j(0)5i(a;)] , ^ 

Note that this equation can have several solutions. 

Using the same decomposition as the one leading to (34) we obtain: 

n^Jl, k(0)5,(a;)] = A_ J2 ^^^^S) ['^^(0) ^'(«)] + ^+ E ^'^^Z, k,(0) a;,(u)] . 

«=-Ts u=l 

Note that ijJj{0) uii{u) > 0, thus the first term is negative and the second one is positive (see eq. (16)). 
The sign of Wij depend on the parameters A-,A+,Ts, but also on the relative strength of the terms 

4.1.4 Convergence to the static weights solution. 

The synaptic adaptation rule (32) defines a mapping Q on the set of synaptic weights; 
with |1 + erd\ < 1, where Tr^'f^j depends on W^'^\ Thus, 



^ = (l + er.)^,,,. + e^^, 

where Sij^ki = liii = k,j = l and otherwise. The second term is a linear response characterizing 
the variation of tt^^) [Hij] with respect to small variations of Wki- Approximating t^^(') by a Gibbs 
distribution with a potential it writes C^(^) j^..{0) + 2 X)(=o C'^Ct),//^^- (i) where C^(t) fj..{t) is the 

time-t correlation function between and Hij . When v^(r) is smooth with respect to W, the derivative 
is dominated by the first term and Q is contracting for sufficiently small e. This is however not a sufficient 
condition for the convergence of (32) because Q is not continuous everywhere. It has jumps whenever 7 
crosses the boundary of a structural stability domain. 
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If the static solution is contained in a open ball where Q is continuous the contraction property ensures 
the convergence to a static solution for any initial condition in this ball (Brouwer theorem). 
In this paper we postulate the convergence and check it numerically. 



4.1.5 Spike train statistics in a static w^eights regime. 

As emphasized in section 2.6.5 and 3.5.1, when the synaptic adaptation rule converges to a fixed point, 
the corresponding statistical model is a Gibbs measure with a potential: 

^* = <P + X*.ct), 

where $ contains the grammar and A are free statistical parameters. The value A* of these parameters 
in the potential ip* is determined by the relation: 



dP 



dXij 



= rdW*j+ty^'[Hij]=0,yi,j, (37) 

X* 



where the pressure is given by: 



P [V] = TdA.W + ^lim i log e^.STHM_ 

This procedure provides us the explicit form of the raster plot probability distribution when the 

adaptation rule converges. But the price to pay is that we have to determine simuUaneously the N'^ 
parameters X*j on which the Gibbs measure depends. Focusing on the joint probability of a small set of 
neurons (pairs, triplets) this constraint can be relaxed in order to be numerically tractable. 



4.2 Numerical checks. 

The main goal of section 4 is to provide an example of the theoretical concepts developed in this paper. 
The emphasis is however not put on numerical results which will be the main topic of a forthcoming paper. 
Therefore, we focus here on numerical simulations for model I only, since the simulations for model II 
are quite more computer-time consuming, and we consider only one case of r] value corresponding to the 
intermediate regime defined above. 



4.2.1 Implementation. 

Neurons dynamics. Previous numerical explorations have shown that a Model I- network of N neu- 
rons, with synapses taken randomly from a distribution A/'(0, ^), where C is a control parameter, exhibits 
a dynamics with very large periods in specific regions of values of the space {p, C) (Cessac, 2008; Cessac 
& Vievillc, 2008). On this basis, we choose N = 100, p = 0.95, C = 4.0, N = 100. The external current 
liext) jg giyg]2 by /ea:t _ Q Qg _|_ o.017V(0, 1). Notc that fixing a sufiiciently large average value 

for this current avoids a situation where neurons stops firing after a certain time ("neural death"). 
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STDP implementation. An cfRcicnt implementation of a STDP rule docs not only depend on the 
analytic form of the rule but also on the respective time scales characterizing neurons and synapses 
evolutions. In "on-line-protocols" , where the synaptic changes occur at the same time scale as neurons 
dynamics the so-called recursive approach appears to be the more appropriated (see (Zou, 2006) and 
references therein) . In our case, we use instead an offline protocol, where we register the dynamics of the 
system on a long time windows, then compute the STDP modification, then let the system evolve again 
to its attractor (see section 3). The offline protocols are very expensive in machine-time, especially when 

(T) 

using long spike trains to get a reliable time average (7r^(/) (0ij)). For this reason we need to add some 
words about our implementation. 

We register spike trains in a binary code. Indeed, this is the cheapest way in memory requirements, 
though it might be expensive for accessing specific spike times. Also, bit- wise operations are faster 
than their equivalents on other types of data. Finally, there exist very fast methods for c;ompnting the 
number of bits on any specific variable of type of integer (The faster for large number of iterations is 
a look-up table of precomputed values, but in-line methods - using parallel methods based on masks- 
are not so far in terms of performances. For details and speed comparison see the G. Manku website 
http://infolab.stanford.edu/ manku/bitcount/bitcount.html). Numerical comparison of this method with 
the direct one that records the dynamics in Boolean arrays and compute STDP spike by spike shows 
enormous performance difference growing exponentially as the length of trains increases. 

Computation of S^'^^J^^. To check the result in section 3.6.2, stating that J^^\w) is a decreasing 
function of r (i.e. S^'^^J^^, is negative) during regular periods, we use the following method (Chazottes, 
1999). 

Fix a potential ip. If T is the length of the experimental raster plot, one divides it into k blocs of 
length n, such that T = k x n. Call: 

= 1 log exp(5f ^(a;)) j , (38) 

with Sj"\u!) = J2iZ^n~^ ip{a^Lo). Then, the following result can be proved (Chazottes, 1999): 

P[il)] = lim lim Pn,k{oj), (39) 

for almost-every w. This computation requires the knowledge of xj). 

The result in (Chazottes, 1999) can be extended straightforwardly to the following case. If ipi,ip2 
two potentials for the same grammar, and Sip = tp2 — V'l then: 



. fc-l jn+n-l . 

P[^2]-P[^i]= lim lim -log -Vexp( V SiP{a'co))] (40) 
n— >+oo fe— >+oo n \ k ^-^ ^-^ I 

\ j=0 t=jn 



P 



+^>V(^).0(>V(-))J 

fe— 1 jn+ri— 1 

j=0 t=in 



Since = P 

5W^, = -^liin^^liin^ilog(i|:exp(^ E ^W^.^OV^, <^*(^(^^))) ) (41) 
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Expanding this equation in power series of e one recovers the expansion (26). In particular, sub- 
tracting to (41) the leading term (of order SW'^'^KdW^'^^) one gets an estimation of the "linear response" 
term k-^^^ in eq. (27). However, on practical grounds, the computation of (41) requires very long 
time series to reduce significantly the finite fluctuations of the empirical average. Similar problems are 
encountered in the computation of linear response from other methods (Cessac, 2007). 

This result holds for any adaptation rule (any polynomial <p). In the case of the adaptation rule (32), 
this reduces to computing: 

/, fc-l jn+n-l \ 

<5H^^ = -frfJwM.wM _ ^lim^^lim - log - ^exp ( ^ eSW^^\U{w'-^\a\(j^^'^))^ . 
4.2.2 Simulation results. 

We have run simulations for STDP parameters values: = 0.99, e = 0.001, t+ = 16, t_ = 32, Aj^ = 
1.0, and Tg = 64, corresponding to standard values (Izhikevich & Desai, 2003). We fix rj = —0.01 
corresponding to the intermediate regime discussed in section 4.1.2. This fixes the value of via eq. 
(35). The length of spike trains is T = 3941 = 4096 - 128 = 128 x 32 - 2 x T^, where 32 is the number 
of bits in long integer, in our (machine-dependent) implementation. An extended description will be 
published elsewhere. The main results are summarized in fig. 3. 

In this figure (Top) we have represented the evolution of the distribution of synaptic weig hts >V(^) 
(see captions). For this value of r/, the histogram evolves to a unimodal distribution with some nodes 
having strong synaptic weights. In fig. 3 (center) we depict the raster plots at the beginning and at the 
end of synaptic adaptation process. In fig. 3 (bottom-left) we have shown the evolution of the Frobenius 
norm for ^W^"^^ (i.e, X^^- |<5W{j'|^), which converges to 0. This shows the convergence of the rule in this 
case. We have also plotted the average weights modification Y^ij SWij . Finally in fig. 3 (bottom 
-right) is represented computed with 512 block of size 512. We see clearly that this quantity is 

negative as expected from eq. (28) and tends to 0. We have also represented the term —^6W^'^\5W'^'^^ 
corresponding to the first order expansion in eq. (28). 

5 Conclusion. 

In this paper, we have considered the questions of characterizing the statistics of spike trains generated 
by neuronal networks, and how these statistics evolve during synaptic adaptation, from the angle of 
dynamical systems theory and ergodic theory. We have introduced a framework where notions such as 
raster plots, empirical averages of observables, and synaptic plasticity rules, coming from neuroscience, 
can be formalized in the language of dynamical systems. In this context, we propose a strategy to 
produce optimal statistical models of spike trains, based on the maximization of statistical entropy where 
expectation of observables must be compatible with their empirical averages. These models are Gibbs 
measures. Considering adaptation rules with slow dynamics, we also suggest that such Gibbs measures 
may result from related adaptation rules whenever synaptic weights converge to a fix value. In this case, 
synaptic weights resume the whole history of the neuronal network contained in the structure of the 
generating function of cumulants (the topological pressure) and in the structure of allowed/forbidden 
raster plots (grammar). 
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Figure 3: (color on line). The parameters are = 100, r^^ = —0.4, rj = —0.01, e = 0.01. Top:(left) Histogram in log scale 
of synaptic connectivity in the network after 1000 epochs of STDP;(right) map of strongest connections, either inhibitory 
(pink dots) or excitatory (red diamonds) (i.e, \Wij\ > 0.03). We also include at the bottom of this map the external current 
of each neuron (multiplied by 10). Middle: (left) A comparison of raster plots at the beginning and (right) the end of the 



synapses evolution. Bottom: (left) Evolution of the mean synaptic change 



(red) and the Frobenius norm 



for STDP matrix on log scale i.e, logj^Q [^i'j l^'^ij^'l'^j (green); (right) Variations of Topological Pressure (with 2 zoom 
graphics showing sharp changes) after each epoch as calculated explicitly in eq. (41) with 512 trains of size 512 (red), and 
first order term approximated by -(5W^^^ .iSW''^' (green). 



To our opinion this theoretical paper is a beginning. Our hope is now to apply this strategy to data 
coming from biological neuronal networks, while only formal neural networks were considered here. At 
the present stage there is indeed a crucial issue to be able to characterize spike train statistics and to 
discriminate statistical models. For example, it is still an open problem to decide, for a given spike train 
in a given experiment, whether firing rate are sufficient for the statistical characterization (leading to 
uncorrelated Poisson models) or if higher order orders statistics such as correlations are also necessary. 
This is more than an academic question. Beyond the choice of such or such statistical model, there are 
hypotheses on how neuronal network convert stimuli from the external word into an action. Discriminating 
statistical models is the way to select the best hypothesis. 

Our approach opens up the possibility of producing numerical methods for such a discrimination, 
e.g. based on KuUback-Leibler divergence minimization obtained from eq. (9). Some of these methods 
are already available as open source code on http://enas.gforge.inria.fr/. They are based on the theory 
developed in the present paper, relying itself on crgodic theory and thermodynamic formalism. At the 
present stage, the overwhelming richness of thermodynamic formalism has not really been considered in 
neuroscience. We hope that this paper will attract the attention of the neuroscience community on this 
point. 
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6 Appendix 



In this appendix we present the thermodynamic formahsm material used in the paper. Though all 
quantities defined below depend on 7, wc have dropped this dependence, which is not central here, in 
order to alleviate notations. We follow (Keller, 1998; Parry & PoUicott, 1990; Chazottcs et al., 1998; 
Chazottes, 1999; Chazottes & Keller, 2009) in the presentation of Gibbs measures in ergodic theory. In 
accordance with the body of the text E is the set of admissible raster plots and we denote by B the 
related Borel sigma-field. 

6.1 Ergodic measures. 

Fix ^ a continuous function. The time average of ^ along the orbit of X is given by 



If jj is an invariant measure^, then, according to Birkhoff theorem, this limit exists for /it-almost every 
X. Standard theorems in ergodic theory ensure that (1) has at least one invariant measure (Katok & 
Hasselblatt, 1998) but there are typically many. Among them, ergodic measures play a distinguished role. 
An invariant measure jj, is ergodic if any real measurable invariant function is /U- almost surely constant. 
As a corollary of Birkhoff 's theorem 7rx(^) = IJ-i^), for /i-almost every X, where /i((/)) = J ^dfi is the 
expectation of (f> with respect to /i. Namely, the empirical average 7rx((/>) is equal to the "ensemble" 
average /x(0) whatever the initial condition, provided it is chosen in the support of /x. Any invariant 
measure can be written as a convex decomposition of ergodic measures. 

6.2 Raster plots statistics. 

Fix n > 0, a set of times ti, . . . t„, and a set of prescribed spiking patterns u}{ti) . . .u{tn)- The set of 
raster plots e E such that u)'{tk) ~ io{tk),k = 1 . . .n contains therefore raster plots where the spiking 
patterns at prescribed times ti . . .tn are imposed (cylinder set). By (countable) intersection and union of 
cylinder sets one can generate all possible events in E, such as "neuron i is firing at time fi", or "neuron 

ii is firing at time ti, and neuron 12 is firing at time t2, ... and neuron i„ is firing at time tn" , etc 

The statistical properties of raster plots are inherited from the statistical properties of orbits of (1), 
via the correspondence i/[C] = /u[{X ^ ui,lu € C}], where C is a cylinder set. On practical grounds, 
these statistical properties are obtained by empirical average. The probability of a cylinder set C is given 



t=i 

where x is the indicatrix function, while the time average of some function ^ : E — » R is given by: 




t=o 



by: 




(42) 



t=i 



8 



'Namely n(P-f ^A) = F~f{A) for any measurable set A C B. 
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When V is ergodic the following holds. 

v = 'K^ lim TT^^^.)' 

1 — *oo 

for z^-almost every w, where 7rt^,(.) is called the empirical measure for the raster plot w. 

6.3 Thermodynamic formalism. 
6.3.1 Working assumptions. 

In this paper, we have adopted a pragmatic approach based on the idea that statistical models obtained 
via the applications of principles in thermodynamic formalism (which arc basically the same as those 
used in statistical physics) could constitute good prototypes for the analysis of real spike trains statistics. 
In this way, our goal is not to prove theorems but instead to use already proved theorems. However, 
theorems have hypotheses that wc now explicit and whose plausible validity is discussed. 

Basically, to apply the machinery described below we need first to assume that raster plots constitute 
a symbolic coding for the orbits of (1), i.e. there is a one-to-one correspondence between membrane 
potential trajectories and raster plots, except for a negligible subset of initial conditions. Wc also assume 
that symbolic trajectories (raster plots) are generated by a finite grammar. This hypothesis is discussed 
in section 2.6.1, and it can actually be rigorously established for models I and II without noise. The case 
with noise is under current investigations. The corresponding transition matrix G defines a sub-shift of 
finite type or topological Markov chain. Actually, part of the results presented below are extensions of 
standard results on Markov chains. We also assumed that G is primitive i.e. 3no > such that Vi,j, 
Vn > no {G"')ij > 0. This amounts to assuming that the dynamics on each attractor is topologically 
mixing^. 

All these hypotheses hold if the dynamical system generating the raster plots is continuous and 
uniformly hyperbolic. Following the definition of (Parry & Pollicott, 1990), a map F : A1 — *■ is 
uniformly hyperbolic on a set A if: 

1. There exists a splitting of the tangent space in unstable (i?") and stable (E^) subspaces such that 
there exists C,fi>0 with ||£)F*||i5., ||DF-*||i3. < C/x*, t > 0. 

2. A contains a dense orbit. 

3. The periodic orbits in A arc dense (and A consists of more than a single closed orbit). 

4. There exists an open set f/ D A such that A = flt^-oo 

Adding noise to model I, II renders them uniformly hyperbolic (see footnote 4 in the text) but they 
are not continuous due to the singularity of the firing threshold. Hence, additional work is necessary 
to prove that there is a finite Markov partition (Cessac & Fernandez, in preparation). Throughout this 
Appendix we assume that all these hypotheses hold. 

"A dynamical system (F, Ai) is topologically mixing if for any pair of open sets U,V C B, there exists no > such that, 

Vn>no, F"C/ny^0 
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6.3.2 Potentials. 



Fix < 8 < 1. Wc define a metric on S, the set of raster plots generated by the grammar G, by 
de(w,w') = B^, where p is the largest integer such that uj(t) = uj(f'),0 < t < p — 1. (The related 
topology thus structures the fact that raster plots coincide until t = p — 1). 
For a function ■0 : S — > R and n> 1 define 

varnip = sup{|i/'(w) — fpi^^')] '■ <^{t) = u}'{t),0 <i<n}. 

We denote by C(E) the space of functions tp such that uar„i/' ^ as n ^ oo. This is the set of continuous 
real functions on S. 

A function tp e C'(E) is called a potential. A potential is regular^'^ if J2n=o '^^''~-n{'4') < oo- Equiva- 
lently, tp is Holder. For a positive integer r a range-r potential is a potential such that xP{lo) = xpluj'), 
if U3{t) = u}'{t),0 < t < r. That is, tp depends only on the r first spiking patterns w(0), . . . ,uj{r — 1). 
Thus, a finite range potential is necessarily regular. Examples are uJi^ (0) (range-1 potential), w,;^ (Ojuii.^ (fc) 
(range-fc potential), uJi-^{0)LUi.-^{k)uJi.^{l) (range-max(A:, I) potential), etc, .... The potential 0,;,- introduced 
in section 3.2 is a range-Tg potential. More generally, all potential introduced in the paper have finite 
range. 

A specific example of potential, used in this paper, corresponds to integrating the grammar into a 
potential $ such that $(a;) = is w is allowed, and $(a;) = — oo otherwise. In this case, however, the 
potential is not continuous anymore but only upper semi-continuous. 



6.3.3 Equilibrium states 

Let be a a-invariant measure and E^") the set of admissible cylinders of lenght n. Call: 
Then, 

h liy] — lim 



n 

is the entropy of v. Let m^*"") be the set of invariant measures for a, then the pressure of a potential tp 
is defined by: 

P[ip]= sup {h[i^] + u[tp]). (43) 

This supremum is attained- not necessarily at a unique measure- where: 

P[ip] ^ h [v-4,] + v-4, [ip] . 

is called an equilibrium state, as it maximises some version of the (neg) free energy. Equivalently, a 
measure u is an equilibrium state for a potential ip G C'(S) if and only if, for all potential 77 e C{T,): 

P[ip + ri\-P[tp]>u [ip] (44) 

This means that is a tangent functional for P at ip. In particular ip has a unique equilibrium state if 
and only if P is differentiable at ip, i.e. lim^^o j {P[''P + t(p] — P[ip] = y{ip)), for all (p € C(T,). 

'^"This condition is analogous to the potential decay ensuring the existence of a thermodynamic limit in statistical 
mechanics (Ruelle, 1969; Meyer, 1980). In the present context it ensures that the Gibbs measure related to the potential i/) 
is unique and that it is also the unique equilibrium state. 
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6.3.4 Properties of the topological pressure. 



Wc review some important properties of the topological pressure used in this paper. Basically, the 
topological pressure has the same properties of thermodynamic potentials like free energy. First, 



P [tp] = limsup - log{Zn{ip)), 



where Zn{tl^) is a partition function: 



and 



Also, the topological pressure is a convex functional of the potentials i.e. 

P [aip^ + (1 - a)ip2] < aP [t/^i] + (1 - a)P [^/^a] ; a G [0, 1] 



(45) 



(46) 



(47) 



(48) 



Like the free energy, the topological pressure is a generating functional (when it is differentiable). For 
example. 



dP [tp + a(p] 



da 



a=0 



and: 



d'^P[tjj + ai(pj^+a2(p2] 



da\da2 



ai=a2=0 



(49) 



(50) 



t=o 



where: 



C^^^^{t) = {(pi 0(7*02) - ^'i/'(0l)^'i/'(02)) 

is the correlation function of (pi, (j)2 at time t. 
6.3.5 Gibbs measure. 

Assume that the transition matrix of the shift a is primitive and that ip is regular. Then, there is a 
unique ergodic measure , called a Gibbs measure, for which one can find some constants ci , C2 with 
< ci < 1 < C2 such that for all n > 1 and for all u G'E: 



ci < 



<C2. 



exp(-nP [tp] + S^"-^tp{u))) 
where S^'^^ipiu) is given by (47). is also the unique equilibrium state. 



(51) 
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6.3.6 Ruelle-Perron-Probenius operator. 



The Ruelle-Perron-Frobcnius (RPF) operator for the potential tp, denoted by L^p, acts on functions 
g £ C(S) as L^g{uj) = e^'-^ ^ giuo'). This operator has a unique maximal eigenvalue s = 

gP[i/'] associated to a right cigcnfunction and a left cigcnfunction (probability measure) such that 
L^b^{ijj) ~ sb^{u!) and J L^vdp^ ^ ■'^ J '^d,p^, for all v G C(S). The remaining part of the spectrum is 
located in a disk in the complex plane, of radius strictly lower than s. Finally, for all v e C{T,) 




The Gibbs measure is = b^p^. 

Given ij) and knowing its right eigenfunction b^ and pressure P [tp] one can construct a new potential: 

^{u) = -0(c^) - \0g{b^{<TUj)) + \og{b^{Lu)) - P [-0] , (52) 

such that L*l = 1, where 1 is the constant function l(w) = 1. Such a potential is called "normalised". 
Its pressure is zero. 

If ■0 has a finite range-r then the RPF operator reduces to the transition matrix of a (r + l)-step 
Markov chain. Thus, b^ and p^ can be easily determined. The Gibbs measure is none other than the 
invariant mesure of this chain. This can be used to generate a raster plot distributed according to a 
Gibbs distribution with a potential ■0. Moreover, for i/.0-almost-every raster plot w: 

TT^T^ [a;(l)...w(r)] 
This allows to obtain of tp numerically (Chazottes et al., 1998). 



6.3.7 KuUack-Leibler divergence. 

There is a last important property. Let p, be an invariant measure and v.^ a Gibbs measure with a 
potential ■0, both defined on the same set of sequences S. Let 



d(/i,z/^) = limsup- ^ /i ([a;]o log 



(Mo,n-l) 



(Mo,n-l). 

be the relative entropy (or Kullack-Leibler divergence) between p and v. Then, 

d {p, v^) = P [t/j] - p{'tl}) - h{p). 



(54) 



(55) 



li p = V, d{p, v) = 0. The converse may not be true if the potential is not regular. 

If a raster u is typical for the Gibbs measure u.^ then one expects that tt^T^ becomes closer to i/i/, than 

any other Gibbs measure (for another potential xp ) as T growths. This provides a criterion to compare 
two Gibbs measures (and to discrimate between several statistical models) . Indeed, the following theorem 
holds (Chazottes et al., 1998). 
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Theorem 2 For any pair of distinct^^ regular potentials 0, tp, there exists an integer N = N{(f), ip, w) 
such that, for all T > N, 

d(T:^J\v^)<d(Ti'^J\v^) (56) 

for v^-almost every uj. 

This says that d (^ir^^ , j becomes, for sufficiently large T, smaller than the KuUback-Leibler diver- 
gence between and any other Gibbs measure. 

^^Non cohomologous. 
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